
Advanced Regular Expressions

https://github.com/ziishaned/learn-regex

Sample Problems

Problem 1

Which of the following strings match the regular expression pattern "^w{3}\.([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])+\.)+[a-z0-9][-a-z0-9]{0,61}[a-z0-9]"?

  1. www.google.com
  2. www.-petsmart.com
  3. www.edu-.ro
  4. www.google.co.in
  5. www.examples.c.net
  6. www.edu.training.computer-science.org
  7. www.everglades_holidaypark.com

Solution: This Regular Expression matches a domain name used to access web sites.

The pattern starts with the subdomain www and continues with several domain names separated by dots (counting from the right: top-level domain (TLD), second-level domain (SLD), third-level domain, and so on).

A domain name contains only lowercase letters, digits and hyphens. It cannot begin or end with a hyphen, and its length is between 2 and 63 characters.

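^w{3}\. : the string starts with www (the letter w exactly three times), followed by a literal dot; the dot is escaped because an unescaped . matches any character;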

[a-z0-9] : the first and the last character of a domain name can only be a lowercase letter or a digit;

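[-a-z0-9]{0,61} : the characters in between can be lowercase letters, digits or hyphens, at most 61 of them, so together with the first and last characters a name has 2 to 63 characters. Strictly, this cap holds only for the final (top-level) name: in the repeated part the group ([-a-z0-9]{0,61}[a-z0-9]) is followed by +, so it can repeat and longer names still match;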

Each of these names is followed by a dot, and the group (...\.)+ repeats once for every such name. The last sequence [a-z0-9][-a-z0-9]{0,61}[a-z0-9] is for the top-level domain, which is not followed by a dot.
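To see what the domain-name part accepts on its own, here is a small Python sketch that tests single names against the sub-pattern used for the names before the top-level domain. The names LABEL and is_label are hypothetical, chosen for illustration; fullmatch is used so that the whole name has to match.

```python
import re

# Hypothetical name: the sub-pattern for one domain name that is followed by a dot
LABEL = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])+")


def is_label(name: str) -> bool:
    """Return True if the whole of `name` matches LABEL."""
    return LABEL.fullmatch(name) is not None


print(is_label("google"))                  # True
print(is_label("computer-science"))        # True
print(is_label("-petsmart"))               # False: begins with a hyphen
print(is_label("edu-"))                    # False: ends with a hyphen
print(is_label("c"))                       # False: one character is too short
print(is_label("everglades_holidaypark"))  # False: '_' is not allowed
# True: the trailing + lets the inner group repeat, so 63 is not an upper bound here
print(is_label("a" * 100))
```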

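The strings matched by this pattern are 1, 4 and 6. String 2 is rejected because -petsmart begins with a hyphen, 3 because edu- ends with a hyphen, 5 because c has only one character, and 7 because the underscore is not an allowed character.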
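The answer can be checked by running the pattern over the seven strings. This Python sketch assumes the dots in the pattern are meant to be escaped as \. and uses re.fullmatch, since the pattern has no $ anchor. The second pattern, UNESCAPED (a name chosen for illustration), keeps bare dots to show why the backslashes matter.

```python
import re

# Problem 1 pattern with the dots escaped (assumption: \. was intended)
DOMAIN = re.compile(
    r"^w{3}\.([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])+\.)+[a-z0-9][-a-z0-9]{0,61}[a-z0-9]"
)

# The same pattern with bare dots, each of which matches any character
UNESCAPED = re.compile(
    r"^w{3}.([a-z0-9]([-a-z0-9]{0,61}[a-z0-9])+.)+[a-z0-9][-a-z0-9]{0,61}[a-z0-9]"
)

candidates = [
    "www.google.com",
    "www.-petsmart.com",
    "www.edu-.ro",
    "www.google.co.in",
    "www.examples.c.net",
    "www.edu.training.computer-science.org",
    "www.everglades_holidaypark.com",
]

# fullmatch() requires the whole string to match, since the pattern has no $
matching = [i for i, s in enumerate(candidates, start=1) if DOMAIN.fullmatch(s)]
print(matching)  # [1, 4, 6]

# With bare dots, '_' is accepted as a separator, so string 7 slips through
print(UNESCAPED.fullmatch(candidates[6]) is not None)  # True
```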

Problem 2

Write a regular expression describing the set of strings formed according to the following rules:

  1. Contain only lowercase letters of the English alphabet and the character '.';
  2. Start and end with the same letter;
  3. Contain a sequence of at least one and at most 3 vowels, separated by zero or more '.' characters from a sequence of at least one consonant.

Solution:

The Regular Expression is:

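^([a-z])[aeiou]{1,3}\.*[b-df-hj-np-tv-z]+\1$

^ and $ anchor the pattern to the beginning and the end of the string, so the whole string must follow the rules;

([a-z]) is capturing group 1, which captures the first letter;

\1 is a backreference to group 1: the string must end with the same letter it started with;

[aeiou]{1,3} describes a sequence of one to three vowels (no commas are needed inside a character class; written as [a,e,i,o,u] it would also accept the ',' character);

\.* the character '.' appears zero or more times (it is escaped because an unescaped . matches any character);

[b-df-hj-np-tv-z]+ a sequence of at least one consonant: the ranges b-d, f-h, j-n, p-t and v-z cover every letter except the vowels a, e, i, o, u.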
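The corrected expression can be tried out in Python. In this sketch the sample strings are made up for illustration, and ORIGINAL (a hypothetical name) reproduces the expression with commas in the vowel class, an unescaped dot and no anchors; fullmatch stands in for the missing anchors there, so only the effect of the commas and the dot is compared.

```python
import re

# Corrected pattern: anchored, vowel class without commas, literal dots only
PATTERN = re.compile(r"^([a-z])[aeiou]{1,3}\.*[b-df-hj-np-tv-z]+\1$")

# The expression with commas in the vowel class and an unescaped dot
ORIGINAL = re.compile(r"([a-z])[a,e,i,o,u]{1,3}.*[b-df-hj-np-tv-z]+(\1)")

# Hypothetical sample strings
samples = ["bacb", "toe..st", "kaeioupk", "bacd", "b,cb", "ba#cb"]

# For each sample: (accepted by corrected pattern, accepted by ORIGINAL)
results = {
    s: (PATTERN.search(s) is not None, ORIGINAL.fullmatch(s) is not None)
    for s in samples
}

for s, (corrected, original) in results.items():
    print(f"{s:10} corrected={corrected!s:5} original={original}")
```

Only bacb and toe..st satisfy the rules. Without the fixes, the expression also accepts kaeioupk (five vowels in a row, the extra ones absorbed by .*), b,cb (the comma taken as a vowel) and ba#cb (# matched by the bare dot).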